
Collaborating Authors

ONNX Runtime


ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications

Ackva, Valentin, Schulz, Fares

arXiv.org Artificial Intelligence

Numerous tools for neural network inference are currently available, yet many do not meet the requirements of real-time audio applications. In response, we introduce anira, an efficient cross-platform library. To ensure compatibility with a broad range of neural network architectures and frameworks, anira supports ONNX Runtime, LibTorch, and TensorFlow Lite as backends. Each inference engine exhibits real-time violations, which anira mitigates by decoupling the inference from the audio callback to a static thread pool. The library incorporates built-in latency management and extensive benchmarking capabilities, both crucial to ensuring a continuous signal flow. Three different neural network architectures for audio effect emulation are then benchmarked across various configurations, and statistical modeling is employed to identify the influence of various factors on performance. The findings indicate that ONNX Runtime exhibits the lowest runtimes for stateless models, while LibTorch demonstrates the fastest performance for stateful models. Our results also indicate that for certain model-engine combinations the initial inferences take longer, and that these initial inferences exhibit a higher incidence of real-time violations. In recent years, neural networks have become an integral part of modern audio digital signal processing. Their applications include audio classification [1], audio transcription [2], audio source separation [3], audio synthesis [4], [5], [6], and audio effects [7]. While offline processing is inherently supported, translating these architectures to real-time implementations remains challenging.


ONNX: The Standard for Interoperable Deep Learning Models

#artificialintelligence

The first time I heard about ONNX was during my internship at INRIA. I was working on developing neural network pruning algorithms in the Julia language. There weren't many pre-trained models yet that I could use, so using ONNX to import models developed with other languages and frameworks might have been a solution. In this article, I want to introduce ONNX and explain its enormous potential, walking through a practical example. ONNX, or Open Neural Network Exchange, is an open-source standard for representing deep learning models. It was developed by Facebook and Microsoft to make it easier for researchers and engineers to move models between different deep-learning frameworks and hardware platforms.


Fast DistilBERT on CPUs

Shen, Haihao, Zafrir, Ofir, Dong, Bo, Meng, Hengyu, Ye, Xinyu, Wang, Zhe, Ding, Yi, Chang, Hanwen, Boudoukh, Guy, Wasserblat, Moshe

arXiv.org Artificial Intelligence

Transformer-based language models have become the standard approach to solving natural language processing tasks. However, industry adoption usually requires maximum throughput under strict latency constraints, which prevents many Transformer models from being used in production. To address this gap, model compression techniques such as quantization and pruning may be used to improve inference efficiency. However, these compression techniques require specialized software to apply and deploy at scale. In this work, we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators. We demonstrate the efficiency of our pipeline by creating a Fast DistilBERT model with minimal accuracy loss on the SQuADv1.1 question-answering benchmark, measuring throughput under typical production constraints and environments. Our pipeline outperforms the state-of-the-art Neural Magic DeepSparse runtime by up to 50% and achieves up to a 4.1x speedup over ONNX Runtime.
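As a toy illustration of one of these compression techniques (not the paper's actual pipeline), symmetric int8 quantization stores a weight tensor as 8-bit integers plus a single float scale: roughly 4x less memory than float32 and faster integer kernels, at the cost of a bounded rounding error.

```python
# Toy symmetric per-tensor int8 quantization; values and names are
# illustrative, not taken from the paper's pipeline.
def quantize(weights):
    """Map floats to int8 with a single scale so that w ≈ q * scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.02, -1.27, 0.64, 0.004]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Each restored weight is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

Production pipelines (including the one described above) apply this per operator inside the model graph and pair it with kernels that compute directly on the int8 values, which is where the speedup comes from.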


ML Tools to Accelerate your work with Cassie Breviu

#artificialintelligence

Welcome to the InfoQ podcast. My name is Roland Meertens and today I am interviewing Cassie Breviu. She is a senior program manager at Microsoft and hosted the innovations in machine learning systems track at QCon London. I am actually speaking to her in person at the venue of the QCon London conference. In this interview, I will talk with her about how she got started with AI and which machine learning tools can accelerate your work when deploying models on a wide range of devices. We will also talk about GitHub Copilot and how AI can help you be a better programmer. If you want to see her talk on how to operationalize transformer models on the edge, at the moment of recording you can still register for the QCon Plus conference, or check whether the recording has been uploaded to infoq.com. Welcome, Cassie, to QCon London. I'm very glad to see you here. I hope you're happy to be at this conference. I heard that you actually got into AI by being at a conference. I am thoroughly enjoying this conference. It's put together really well and I really enjoy it. So what happened was I was at a developer conference. I was a full stack C# engineer and I'd always been really interested in AI and machine learning, but it always seemed scary and out of reach. I had even tried to read some books on it and I thought, "Well, this might be just too much for me or too complicated or I just can't do this." So I went to this talk by Jennifer Marsman and she did this amazing talk on, Would You Survive the Titanic Sinking? She used a product called Azure Machine Learning Designer.


Azure unlocks business opportunity with 5G and AIoT to drive digital transformation

#artificialintelligence

In the post-pandemic era, global supply chains have been restructured and dramatically changed. As industries around the world accelerate their digital transformation to create new business opportunities, the technologies of 5G and AIoT are becoming the key driving forces, with mainstream sectors such as healthcare, retail, and manufacturing continuing to increase market demand. SYNNEX, an industry leader in IT distribution, hosted a 5G and AIoT online event named "High speed connection to enable smart applications" in December 2021, with invited keynote speakers from Microsoft Taiwan, Intel, and ITRI.


Accelerate and Productionize ML Model Inferencing Using Open-Source Tools

#artificialintelligence

You've finally got that perfect trained model for your data set. To run and deploy it to production, there's a host of issues that lie ahead: performance latency, environments, framework compatibility, security, deployment targets… there's a lot to consider! In this tutorial, we'll look at solutions for these common challenges using ONNX and related tooling. ONNX (Open Neural Network eXchange), a graduated open-source project under the Linux Foundation's LF AI, defines a standard format for machine learning models that enables AI developers to use the frameworks and tools of their choice to train, infer, and deploy on a variety of hardware targets.


Tribuo: Machine Learning with Provenance in Java

Pocock, Adam

arXiv.org Machine Learning

Machine Learning models are deployed across a wide range of industries, performing a wide range of tasks. Tracking these models and ensuring they behave appropriately is becoming increasingly difficult as the number of deployed models increases. There are also new regulatory burdens for ML systems which affect human lives, requiring a link between a model and its training data in high-risk situations. Current ML monitoring systems often provide provenance and experiment tracking as a layer on top of an ML library, allowing room for imperfect tracking and skew between the tracked object and the metadata. In this paper we introduce Tribuo, a Java ML library that integrates model training, inference, strong type-safety, runtime checking, and automatic provenance recording into a single framework. All Tribuo's models and evaluations record the full processing pipeline for input data, along with the training algorithms, hyperparameters and data transformation steps automatically. The provenance lives inside the model object and can be persisted separately using common markup formats. Tribuo implements many popular ML algorithms for classification, regression, clustering, multi-label classification and anomaly detection, along with interfaces to XGBoost, TensorFlow and ONNX Runtime. Tribuo's source code is available at https://github.com/oracle/tribuo under an Apache 2.0 license with documentation and tutorials available at https://tribuo.org.


GitHub - graviraja/MLOps-Basics

#artificialintelligence

There is nothing magic about magic. The magician merely understands something simple which doesn't appear to be simple or natural to the untrained audience. Once you learn how to hold a card while making your hand look empty, you only need practice before you, too, can "do magic." Note: Please raise an issue for any suggestions, corrections, and feedback. The goal of the series is to understand the basics of MLOps: model building, monitoring, configuration, testing, packaging, deployment, CI/CD, etc.


BERT : A Machine Learning Model for Efficient Natural Language Processing

#artificialintelligence

BERT is a machine learning model that serves as a foundation for improving the accuracy of machine learning in Natural Language Processing (NLP). Pre-trained models based on BERT, re-trained on big data to solve a variety of domain-specific NLP tasks, are publicly available (BioBERT for biomedical text, SciBERT for scientific publications, ClinicalBERT for clinical notes). One core pre-training task is masked-word prediction, which can be used, for example, to proofread sentences; BERT-based models can also perform named entity recognition on text.


ONNX Runtime on Azure Kubernetes Service -- ONNX Runtime 1.9.99 documentation

#artificialintelligence

Throughout this tutorial, we will be referring to ONNX, a neural network exchange format used to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools (CNTK, PyTorch, Caffe, MXNet, TensorFlow) and choose the combination that is best for them. ONNX is developed and supported by a community of partners including Microsoft AI, Facebook, and Amazon. For more information, explore the ONNX website and open source files. ONNX Runtime is the runtime engine that enables evaluation of trained machine learning (traditional ML and Deep Learning) models with high performance and low resource utilization.